AITopics | Observational Study

Observational Study

Proximal Path-Specific Inference

Bai, Yang, Wu, Sihan, Sun, Baoluo, Cui, Yifan

arXiv.org Machine Learning, May-12-2026

Mediation analysis (Robins & Greenland 1992, Pearl 2001, Imai, Keele & Tingley 2010, Tchetgen Tchetgen & Shpitser 2012) provides a principled framework for investigating causal mechanisms by decomposing the effect of a treatment A on an outcome Y into pathways operating through a mediator of interest M. Classical mediation analysis focuses on the natural indirect effect, corresponding to the pathway from A to Y through M, and the natural direct effect, corresponding to pathways not through M. These estimands are well understood when a single mediator is present and strong identification assumptions hold. However, in many applications, there exist multiple intermediate variables between treatment and outcome. In such settings, conventional mediation analysis typically requires the absence of treatment-induced mediator-outcome confounders (often referred to as recanting witnesses) as well as the absence of unmeasured confounding. When these requirements are violated, commonly used identification assumptions such as sequential ignorability (Imai, Keele & Yamamoto 2010) or nonparametric structural equation models with independent errors (NPSEM-IE) (Pearl 2009) no longer suffice to identify natural indirect effects (Avin et al. 2005, Tchetgen Tchetgen & VanderWeele 2014). Figure 1 illustrates this issue: the recanting witness D is directly affected by A and simultaneously confounds the relationship between M and Y. Such treatment-induced confounding is common in epidemiologic studies, particularly when the mediator of interest occurs long after treatment initiation (Robins 1999). A motivating example arises in studies of preterm birth. Mediation analysis has been widely used to explore whether adequate prenatal care (A) reduces the risk of preterm birth (Y) through preeclampsia (M) (Vansteelandt & VanderWeele 2012, VanderWeele et al. 2014, Xia & Chan 2023).
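
The decomposition described in this abstract can be written out in counterfactual notation. The block below is a sketch of the standard single-mediator definitions for a binary treatment. The notation Y(a, M(a')), meaning the outcome under treatment level a with the mediator set to the value it would take under a', is an assumption made here for illustration and is not taken from the paper itself.

```latex
% Y(a, M(a')): outcome under treatment a, mediator at its value under a'.
% Binary treatment assumed: a, a' in {0, 1}.
\begin{align*}
\mathrm{TE}  &= E[Y(1, M(1))] - E[Y(0, M(0))], \\
\mathrm{NDE} &= E[Y(1, M(0))] - E[Y(0, M(0))], \\
\mathrm{NIE} &= E[Y(1, M(1))] - E[Y(1, M(0))], \\
\mathrm{TE}  &= \mathrm{NDE} + \mathrm{NIE}.
\end{align*}
```

Both effects depend on the cross-world mean E[Y(1, M(0))]. This is the quantity that the abstract says can no longer be identified under sequential ignorability or NPSEM-IE once a recanting witness such as D is present.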

artificial intelligence, estimator, machine learning, (16 more...)

arXiv.org Machine Learning

2605.09462

Country: North America > United States > California (0.28)

Genre:

Research Report > Strength Medium (0.48)
Research Report > Observational Study (0.48)
Research Report > New Finding (0.46)

Industry:

Health & Medicine > Public Health (1.00)
Health & Medicine > Epidemiology (1.00)
Health & Medicine > Therapeutic Area > Obstetrics/Gynecology (0.90)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science (0.68)

dea9ddb25cbf2352cf4dec30222a02a5-Supplemental.pdf

Neural Information Processing Systems, Feb-11-2026, 12:45:39 GMT

artificial intelligence, estimator, machine learning, (17 more...)

Neural Information Processing Systems

Country: Asia > Japan > Honshū > Kantō > Kanagawa Prefecture > Yokohama (0.04)

Genre:

Research Report > Strength Medium (0.64)
Research Report > Experimental Study (0.46)
Research Report > Observational Study (0.40)

Industry: Education (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

Identification and Estimation of Joint Probabilities of Potential Outcomes in Observational Studies with Covariate Information

Neural Information Processing Systems, Feb-11-2026, 12:45:35 GMT

However, because they are not identifiable without any assumptions, various assumptions have been utilized to evaluate the joint probabilities of potential outcomes, e.g., the assumption of monotonicity (Pearl, 2009; Tian and Pearl, 2000), the independence between potential outcomes (Robins and Richardson, 2011), the condition of gain equality (Li and Pearl, 2019), and the specific functional relationships between cause and effect (Pearl, 2009). Unlike existing identification conditions, in order to evaluate the joint probabilities of potential outcomes without such assumptions, this paper proposes two types of novel identification conditions using covariate information. In addition, when the joint probabilities of potential outcomes are identifiable through the proposed conditions, the estimation problem of the joint probabilities of potential outcomes reduces to that of singular models, and thus they cannot be evaluated by standard statistical estimation methods. To solve the problem, this paper proposes a new statistical estimation method based on the augmented Lagrangian method and shows the asymptotic normality of the proposed estimators. Given space constraints, the proofs, the details on the statistical estimation method, some numerical experiments, and the case study are provided in the supplementary material.
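
To make the identification problem concrete, the following sketch works through the classical monotonicity case that the abstract cites as an existing assumption. It does not show the paper's covariate-based conditions. It assumes binary potential outcomes Y(0) and Y(1), with marginals P(Y(1)=1) and P(Y(0)=1) already identified, for example under exogeneity. The symbols are introduced here for illustration only.

```latex
% Monotonicity: Y(0) <= Y(1) almost surely, i.e. P(Y(1)=0, Y(0)=1) = 0.
\begin{align*}
P(Y(0)=1) &= P(Y(1)=1, Y(0)=1) + P(Y(1)=0, Y(0)=1) \\
          &= P(Y(1)=1, Y(0)=1), \\
P(Y(1)=1, Y(0)=0) &= P(Y(1)=1) - P(Y(1)=1, Y(0)=1) \\
                  &= P(Y(1)=1) - P(Y(0)=1).
\end{align*}
% Without monotonicity only the Frechet bounds are available:
\[
\max\{0,\; P(Y(1)=1) - P(Y(0)=1)\}
\;\le\; P(Y(1)=1, Y(0)=0)
\;\le\; \min\{P(Y(1)=1),\; P(Y(0)=0)\}.
\]
```

The gap between the lower and upper bounds shows why some additional condition is needed before the joint probabilities can be point-identified.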

artificial intelligence, joint probability, probability, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > Greenland (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(5 more...)

Genre:

Research Report > Strength Medium (0.64)
Research Report > Observational Study (0.64)

Industry:

Health & Medicine > Epidemiology (0.93)
Law (0.93)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

Identification and Estimation of Joint Probabilities of Potential Outcomes in Observational Studies with Covariate Information

Neural Information Processing Systems, Aug-18-2025, 00:15:14 GMT

[...] "sufficiency", and "necessity and sufficiency", which are important concepts [...] In practical science, it is crucial to evaluate the likelihood of one event causing another event. For example, epidemiologists pay attention to determining the likelihood of a particular exposure being the cause of a particular disease.

artificial intelligence, joint probability, probability, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > Greenland (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(5 more...)

Genre:

Research Report > Strength Medium (0.64)
Research Report > Observational Study (0.64)
Research Report > Experimental Study (0.47)

Industry:

Health & Medicine > Epidemiology (1.00)
Law (0.93)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)

Why can't Epidemiology be automated (yet)?

Bann, David, Lowther, Ed, Wright, Liam, Kovalchuk, Yevgeniya

arXiv.org Artificial Intelligence, Jul-22-2025

Recent advances in artificial intelligence (AI) - particularly generative AI - present new opportunities to accelerate, or even automate, epidemiological research. Unlike disciplines based on physical experimentation, a sizable fraction of Epidemiology relies on secondary data analysis and thus is well-suited for such augmentation. Yet, it remains unclear which specific tasks can benefit from AI interventions or where roadblocks exist. Awareness of current AI capabilities is also mixed. Here, we map the landscape of epidemiological tasks using existing datasets - from literature review to data access, analysis, writing up, and dissemination - and identify where existing AI tools offer efficiency gains. While AI can increase productivity in some areas such as coding and administrative tasks, its utility is constrained by limitations of existing AI models (e.g. hallucinations in literature reviews) and human systems (e.g. barriers to accessing datasets). Through examples of AI-generated epidemiological outputs, including fully AI-generated papers, we demonstrate that recently developed agentic systems can now design and execute epidemiological analysis, albeit to varied quality (see https://github.com/edlowther/automated-epidemiology). Epidemiologists have new opportunities to empirically test and benchmark AI systems; realising the potential of AI will require two-way engagement between epidemiologists and engineers.

epidemiology, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2507.15617

Country: Europe > United Kingdom > England (0.28)

Genre:

Research Report > Strength Medium (0.68)
Research Report > Observational Study (0.68)
Research Report > Experimental Study (0.47)

Industry: Health & Medicine > Epidemiology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)

Acoustic Index: A Novel AI-Driven Parameter for Cardiac Disease Risk Stratification Using Echocardiography

Begiashvili, Beka, Fernandez-Candel, Carlos J., Paredes, Matías Pérez

arXiv.org Artificial Intelligence, Jul-21-2025

Traditional echocardiographic parameters such as ejection fraction (EF) and global longitudinal strain (GLS) have limitations in the early detection of cardiac dysfunction. EF often remains normal despite underlying pathology, and GLS is influenced by load conditions and vendor variability. There is a growing need for reproducible, interpretable, and operator-independent parameters that capture subtle and global cardiac functional alterations. We introduce the Acoustic Index, a novel AI-derived echocardiographic parameter designed to quantify cardiac dysfunction from standard ultrasound views. The model combines Extended Dynamic Mode Decomposition (EDMD) based on Koopman operator theory with a hybrid neural network that incorporates clinical metadata. Spatiotemporal dynamics are extracted from echocardiographic sequences to identify coherent motion patterns. These are weighted via attention mechanisms and fused with clinical data using manifold learning, resulting in a continuous score from 0 (low risk) to 1 (high risk). In a prospective cohort of 736 patients, encompassing various cardiac pathologies and normal controls, the Acoustic Index achieved an area under the curve (AUC) of 0.89 in an independent test set. Cross-validation across five folds confirmed the robustness of the model, showing that both sensitivity and specificity exceeded 0.8 when evaluated on independent data. Threshold-based analysis demonstrated stable trade-offs between sensitivity and specificity, with optimal discrimination near this threshold. The Acoustic Index represents a physics-informed, interpretable AI biomarker for cardiac function. It shows promise as a scalable, vendor-independent tool for early detection, triage, and longitudinal monitoring. Future directions include external validation, longitudinal studies, and adaptation to disease-specific classifiers.
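
The evaluation metrics named in this abstract (AUC, sensitivity, specificity at a threshold) can be illustrated with a minimal sketch. The scores and labels below are made-up toy values, not data from the study. The function names `auc` and `sensitivity_specificity` are hypothetical helpers written for this example.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Classify score >= threshold as positive and return (sensitivity, specificity)."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as one half (the Mann-Whitney formulation).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy risk scores in [0, 1] and true disease labels (illustrative values only).
scores = [0.10, 0.35, 0.40, 0.80, 0.65, 0.90, 0.30, 0.55]
labels = [0, 0, 1, 1, 0, 1, 0, 1]

toy_auc = auc(scores, labels)
sens, spec = sensitivity_specificity(scores, labels, threshold=0.5)
print(f"AUC={toy_auc}, sensitivity={sens}, specificity={spec}")
```

Sweeping `threshold` over the score range traces out the sensitivity/specificity trade-off that a threshold-based analysis examines.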

acoustic index, artificial intelligence, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2507.13542

Country: Europe > Spain (0.14)

Genre:

Research Report > Experimental Study (0.48)
Research Report > Strength Medium (0.34)
Research Report > Observational Study (0.34)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

What Makes Treatment Effects Identifiable? Characterizations and Estimators Beyond Unconfoundedness

Cai, Yang, Kalavasis, Alkis, Mamali, Katerina, Mehrotra, Anay, Zampetakis, Manolis

arXiv.org Machine Learning, Jul-1-2025

Most of the widely used estimators of the average treatment effect (ATE) in causal inference rely on the assumptions of unconfoundedness and overlap. Unconfoundedness requires that the observed covariates account for all correlations between the outcome and treatment. Overlap requires the existence of randomness in treatment decisions for all individuals. Nevertheless, many types of studies frequently violate unconfoundedness or overlap, for instance, observational studies with deterministic treatment decisions - popularly known as Regression Discontinuity designs - violate overlap. In this paper, we initiate the study of general conditions that enable the identification of the average treatment effect, extending beyond unconfoundedness and overlap. In particular, following the paradigm of statistical learning theory, we provide an interpretable condition that is sufficient and necessary for the identification of ATE. Moreover, this condition also characterizes the identification of the average treatment effect on the treated (ATT) and can be used to characterize other treatment effects as well. To illustrate the utility of our condition, we present several well-studied scenarios where our condition is satisfied and, hence, we prove that ATE can be identified in regimes that prior works could not capture. For example, under mild assumptions on the data distributions, this holds for the models proposed by Tan (2006) and Rosenbaum (2002), and the Regression Discontinuity design model introduced by Thistlethwaite and Campbell (1960). For each of these scenarios, we also show that, under natural additional assumptions, ATE can be estimated from finite samples. We believe these findings open new avenues for bridging learning-theoretic insights and causal inference methodologies, particularly in observational studies with complex treatment mechanisms.
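
As a point of reference for the baseline regime this abstract describes, the sketch below shows an inverse-propensity-weighted (IPW) estimate of the ATE when unconfoundedness and overlap both hold. It does not implement the paper's new identification condition or its estimators. The data are simulated, the true propensity scores are assumed known, and the names `ipw_ate` and `simulate` are hypothetical.

```python
import math
import random


def ipw_ate(outcomes, treatments, propensities):
    """Horvitz-Thompson style IPW estimate of the average treatment effect."""
    total = 0.0
    for y, t, e in zip(outcomes, treatments, propensities):
        if not 0.0 < e < 1.0:
            # Overlap requires every unit to have some chance of either arm.
            raise ValueError("overlap violated: propensity must lie strictly in (0, 1)")
        total += t * y / e - (1 - t) * y / (1.0 - e)
    return total / len(outcomes)


def simulate(n, true_ate=2.0, seed=0):
    """Simulate data where a single covariate x confounds treatment and outcome."""
    rng = random.Random(seed)
    ys, ts, es = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        e = 1.0 / (1.0 + math.exp(-x))  # treatment probability depends only on x
        t = 1 if rng.random() < e else 0
        y = true_ate * t + x + rng.gauss(0.0, 1.0)
        ys.append(y)
        ts.append(t)
        es.append(e)
    return ys, ts, es


ys, ts, es = simulate(20000)
estimate = ipw_ate(ys, ts, es)

# Naive difference in means ignores the confounder x and is biased upward here.
treated = [y for y, t in zip(ys, ts) if t == 1]
control = [y for y, t in zip(ys, ts) if t == 0]
naive = sum(treated) / len(treated) - sum(control) / len(control)

print(f"IPW estimate: {estimate:.3f}, naive difference: {naive:.3f}, truth: 2.0")
```

The `ValueError` branch marks where a deterministic treatment rule, such as a regression discontinuity design, would break this estimator. That is the kind of setting the paper aims to handle.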

artificial intelligence, condition 1, machine learning, (15 more...)

arXiv.org Machine Learning

2506.04194

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Asia > Indonesia > Sumatra (0.04)

Genre:

Research Report > Strength Medium (1.00)
Research Report > Experimental Study (1.00)
Research Report > Observational Study (0.87)
Research Report > New Finding (0.67)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Consumer Health (1.00)
Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Uncovering Bias Mechanisms in Observational Studies

Demirel, Ilker, Hussain, Zeshan, De Bartolomeis, Piersilvio, Sontag, David

arXiv.org Machine Learning, Jun-3-2025

Observational studies are a key resource for causal inference but are often affected by systematic biases. Prior work has focused mainly on detecting these biases, via sensitivity analyses and comparisons with randomized controlled trials, or mitigating them through debiasing techniques. However, there remains a lack of methodology for uncovering the underlying mechanisms driving these biases, e.g., whether due to hidden confounding or selection of participants. In this work, we show that the relationship between bias magnitude and the predictive performance of nuisance function estimators (in the observational study) can help distinguish among common sources of causal bias. We validate our methodology through extensive synthetic experiments and a real-world case study, demonstrating its effectiveness in revealing the mechanisms behind observed biases. Our framework offers a new lens for understanding and characterizing bias in observational studies, with practical implications for improving causal inference.

artificial intelligence, machine learning, mechanism, (17 more...)

arXiv.org Machine Learning

2506.01191

Country:

North America > Canada (0.14)
Asia > Middle East > Jordan (0.04)
North America > Greenland (0.04)
(2 more...)

Genre:

Research Report > Strength Medium (1.00)
Research Report > Strength High (1.00)
Research Report > Observational Study (1.00)
(2 more...)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.68)
Information Technology > Data Science (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Proximal Inference on Population Intervention Indirect Effect

Bai, Yang, Cui, Yifan, Sun, Baoluo

arXiv.org Machine Learning, Apr-16-2025

Additionally, experiments have shown that depersonalization symptoms can arise as a reaction to alcohol consumption (Raimo et al., 1999), and they are increasingly recognized as a significant prognostic factor in the course of depression (Michal et al., 2024). Despite these findings, little research has explored the mediating role of depersonalization symptoms in the causal pathway from alcohol consumption to depression. In this paper, we propose a methodological framework to evaluate the indirect effect of alcohol consumption on depression, with depersonalization acting as a mediator. To ground our analysis, we use data from a cross-sectional survey conducted during the COVID-19 pandemic by Domínguez-Espinosa et al. (2023) as a running example. In observational studies, the population average causal effect (ACE) and the natural indirect effect (NIE) are the most commonly used measures of total and mediation effects, respectively, to compare the outcomes of different intervention policies. For instance, in our running example, these two measures compare the depression outcomes between individuals engaging in hazardous versus non-hazardous alcohol consumption. However, clinical practice imposes ethical constraints, as healthcare professionals would not prescribe harmful levels of alcohol consumption. As a result, hypothetical interventions involving dangerous exposure levels are unrealistic. To address this situation with potentially harmful exposure, Hubbard and Van der Laan (2008) propose the population intervention effect (PIE), which contrasts outcomes between the natural population and a hypothetical population where no one is exposed to the harmful exposure level.
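
The contrast between the ACE and the PIE described in this abstract can be sketched in counterfactual notation. The block assumes a binary exposure A (1 = harmful level), potential outcomes Y(a), and consistency Y = Y(A). The sign convention for the PIE is chosen here for illustration and may differ from the one used in the paper.

```latex
% ACE compares two hypothetical populations; PIE compares the observed
% population with a hypothetical unexposed one.
\begin{align*}
\mathrm{ACE} &= E[Y(1)] - E[Y(0)], \\
\mathrm{PIE} &= E[Y] - E[Y(0)] \\
             &= E[Y(A) - Y(0)] \\
             &= P(A=1)\, E[Y(1) - Y(0) \mid A = 1].
\end{align*}
```

The last line shows that the PIE never requires assigning the harmful exposure to anyone who did not already receive it. This is why it avoids the unrealistic intervention the abstract describes.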

artificial intelligence, bridge function, machine learning, (16 more...)

arXiv.org Machine Learning

2504.11848

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > Mexico > Mexico City > Mexico City (0.04)
Asia > Singapore > Central Region > Singapore (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)

Genre:

Research Report > Strength Medium (0.66)
Research Report > Observational Study (0.66)
Research Report > New Finding (0.46)

Industry:

Health & Medicine > Epidemiology (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Causal Interpretations in Observational Studies: The Role of Sociocultural Backgrounds and Team Dynamics

Wang, Jun, Yu, Bei

arXiv.org Artificial Intelligence, Feb-3-2025

The prevalence of drawing causal conclusions from observational studies has raised concerns about potential exaggeration in science communication. While some believe causal language should only apply to randomized controlled trials, others argue that rigorous methods can justify causal claims in observational studies. Ideally, causal language should align with the strength of the evidence. However, through the analysis of over 80,000 observational study abstracts using computational linguistic and regression methods, we found that causal language is more frequently used by less experienced authors, smaller research teams, male last authors, and authors from countries with higher uncertainty avoidance indices. These findings suggest that the use of causal language may be influenced by external factors such as the sociocultural backgrounds of authors and the dynamics of research collaboration. This newly identified link deepens our understanding of how such factors help shape scientific conclusions in causal inference and science communication.

artificial intelligence, causal language, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2502.12159

Country:

Asia > Taiwan (0.05)
Asia > South Korea (0.05)
Asia > China (0.05)
(25 more...)

Genre:

Research Report > Strength Medium (1.00)
Research Report > Observational Study (1.00)
Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine (1.00)
Media (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)
